A Language-Independent Approach to European Text Retrieval
نویسندگان
چکیده
We present an approach to multilingual information retrieval that does not depend on the existence of specific linguistic resources such as stemmers or thesaurii. Using the HAIRCUT system we participated in the monolingual, bilingual, and multilingual tasks of the CLEF-2000 evaluation. Our method, based on combining the benefits of words and character n-grams, was effective for both language-independent monolingual retrieval as well as for cross-language retrieval with translated queries. After describing our monolingual retrieval approach we compare a translation method using aligned parallel corpora to commercial machine translation software.
منابع مشابه
Language-Dependent and Language-Independent Approaches to Cross-Lingual Text Retrieval
We investigates the effectiveness of language-dependent approaches to document retrieval, such as stemming and decompounding, and constrast them with language-independent approaches, such as character n-gramming. In order to reap the benefits of more than one type of approach, we also consider the effectiveness of the combination of both types of approaches. We focus on document retrieval in ni...
متن کاملUsing Fuzzy LR Numbers in Bayesian Text Classifier for Classifying Persian Text Documents
Text Classification is an important research field in information retrieval and text mining. The main task in text classification is to assign text documents in predefined categories based on documents’ contents and labeled-training samples. Since word detection is a difficult and time consuming task in Persian language, Bayesian text classifier is an appropriate approach to deal with different...
متن کاملUsing Eurowordnet in a Concept-Based Approach to Cross-Language Text Retrieval
We present an approach to cross± language text retrieval based on the EuroWordNet (EWN)multilingual semantic database. EuroWordNet is a multilingual,WordNet ± like database with basic semantic relations between words for several European languages (English, Dutch, Spanish, Italian, German, French, Czech, and Estonian). In addition to the relations in WordNet 1.5, EWN includes domain labels, cro...
متن کاملUsing Fuzzy LR Numbers in Bayesian Text Classifier for Classifying Persian Text Documents
Text Classification is an important research field in information retrieval and text mining. The main task in text classification is to assign text documents in predefined categories based on documents’ contents and labeled-training samples. Since word detection is a difficult and time consuming task in Persian language, Bayesian text classifier is an appropriate approach to deal with different...
متن کاملUsing Text Surrounding Method to Enhance Retrieval of Online Images by Google Search Engine
Purpose: the current research aimed to compare the effectiveness of various tags and codes for retrieving images from the Google. Design/methodology: selected images with different characteristics in a registered domain were carefully studied. The exception was that special conceptual features have been apportioned for each group of images separately. In this regard, each group image surr...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2000